When you are working on plain text, the operations are dead simple. There are only insertions,
ins(character, index), and deletions, del(index, number_of_characters), and that is all.
Most CRDT literature takes this as the classic example and shows how a given algorithm satisfies
eventual consistency constraints with this simple data model. When you bring in formatting, the
complexity goes one step further: the algorithm has to deal with different formatting
options and generate a sensible outcome tailored to each case. With formatting, every operation
now covers a range with two boundaries, a start and an end:
applyFormat(formatType, startIndex, endIndex)
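
To make the data model concrete, here is a minimal sketch of the three operations applied to a
single, non-concurrent document. It is not a CRDT: there are no replicas, ids or merge rules.
The Doc class, the per-character set of format names, the half-open [start, end) range and the
name delete (del is a reserved word in Python) are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Doc:
    # Hypothetical model: one character per slot, plus a set of format names per slot.
    chars: List[str] = field(default_factory=list)
    marks: List[Set[str]] = field(default_factory=list)


def ins(doc: Doc, character: str, index: int) -> None:
    """Insert one character at index; it starts with no formatting (an assumption)."""
    doc.chars.insert(index, character)
    doc.marks.insert(index, set())


def delete(doc: Doc, index: int, number_of_characters: int) -> None:
    """Remove number_of_characters characters starting at index."""
    del doc.chars[index:index + number_of_characters]
    del doc.marks[index:index + number_of_characters]


def apply_format(doc: Doc, format_type: str, start_index: int, end_index: int) -> None:
    """Apply format_type to the half-open range [start_index, end_index)."""
    for i in range(start_index, end_index):
        doc.marks[i].add(format_type)


doc = Doc()
for position, ch in enumerate("hello"):
    ins(doc, ch, position)
apply_format(doc, "bold", 1, 4)  # marks "ell"
delete(doc, 0, 2)                # removes "he", leaving "llo"
```

Even in this toy version the formatting range is tied to positions that insertions and deletions
keep shifting, which is where the extra complexity comes from once edits are concurrent.
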
When we move on to block elements, the content of the document becomes a tree (much like most
UI representations). To be specific, the document is now a proper tree with flat character
arrays at the leaf nodes. Think of the DOM.
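
The shape described above can be sketched as follows. Block, TextLeaf and plain_text are
hypothetical names chosen for this illustration, not taken from any particular editor or CRDT
library; the only point is that inner nodes are blocks and the leaves hold flat character arrays.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextLeaf:
    # Leaf node: a flat array of characters, as in the plain-text model.
    chars: List[str] = field(default_factory=list)


@dataclass
class Block:
    # Inner node: a block element such as a paragraph, list, table, row or cell.
    kind: str
    children: List[Union["Block", TextLeaf]] = field(default_factory=list)


def plain_text(node: Union[Block, TextLeaf]) -> str:
    """Concatenate the leaf character arrays in document order."""
    if isinstance(node, TextLeaf):
        return "".join(node.chars)
    return "".join(plain_text(child) for child in node.children)


document = Block("doc", [
    Block("paragraph", [TextLeaf(list("Hi"))]),
    Block("paragraph", [TextLeaf(list("!"))]),
])
```

Character and formatting operations now have to say which leaf they target, not just an index.
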
Suddenly the complexity of the CRDT we have to come up with has increased many times over. It
now has to worry about node additions, node deletions and node movements, all of which have to
play smoothly with the existing character array and formatting operations.
To put it in perspective, the algorithm might end up having to compose a character deletion,
a node addition and a formatting operation, all landing concurrently at overlapping locations.
Somehow it has to resolve them into the most sensible outcome, all while keeping the rich-text
tree semantically correct, i.e. renderable by the rendering program. For example, imagine one
user splitting a cell into two while another user concurrently deletes the entire column that
cell was in. This should never result in a tree (table) with a cell hanging outside of any
column; such a table is technically not renderable.
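
The table example can be turned into a small, deliberately naive sketch. The Table model (a set
of live column ids plus a map from cell id to column id), the cell ids and the way the split is
represented are all assumptions for illustration. The code does not resolve the conflict; it only
shows how applying both edits blindly breaks the "every cell belongs to an existing column"
invariant, and how that invariant could be checked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Table:
    columns: Set[str] = field(default_factory=set)       # ids of live columns
    cells: Dict[str, str] = field(default_factory=dict)  # cell id -> column id


def is_renderable(table: Table) -> bool:
    """Invariant: every cell must sit in a column that still exists."""
    return all(column in table.columns for column in table.cells.values())


def orphaned_cells(table: Table) -> List[str]:
    """Ids of cells whose column is gone, sorted for stable output."""
    return sorted(cid for cid, column in table.cells.items()
                  if column not in table.columns)


def naive_merge() -> Table:
    """Apply the two concurrent edits one after the other with no conflict handling."""
    table = Table(columns={"A", "B"}, cells={"a1": "A", "b1": "B"})

    # User 2 deletes column B together with the cells it could see at the time.
    table.columns.discard("B")
    table.cells.pop("b1")

    # User 1's split of b1 arrives afterwards and adds its second half to column B.
    table.cells["b1-right"] = "B"
    return table


merged = naive_merge()
```

Here is_renderable(merged) is False and orphaned_cells(merged) reports the stray half of the
split cell. A real algorithm has to make sure a state like this can never be reached, for
example by letting the column deletion also win over cells it has not seen yet.
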
Speaking of trees that follow a certain grammar, does that ring any bells? Yes, ASTs! The
abstract syntax trees of almost all programming languages follow a specification or a grammar.
If a tree does not conform to that grammar, the source text most likely has a syntax error.